Space-Efficient Scheduling of Nested Parallelism
Author: GIRIJA
Abstract
Many of today's high-level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. This paper presents an on-line scheduling algorithm that is provably space-efficient and time-efficient for nested parallel languages. For a computation with depth D and serial space requirement S_1, the algorithm generates a schedule that requires at most S_1 + O(p · D) space (including scheduler space) on p processors. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space- and time-efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
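To make the quantities in the bound concrete, the following is a minimal sketch, not the paper's scheduler. It assumes a simplified fork-join model in which each task holds its allocation until all of its children have joined. The names `Task`, `depth`, `serial_space`, and `space_bound`, as well as the constant `c` standing in for the factor hidden by the O-notation, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One node of a nested-parallel (fork-join) computation (assumed model)."""
    alloc: int = 0   # memory units held until this task and its children finish
    steps: int = 1   # sequential steps executed by this task itself
    children: List["Task"] = field(default_factory=list)  # forked in parallel


def depth(t: Task) -> int:
    # D: the longest chain of dependent steps (own steps plus the deepest child).
    return t.steps + max((depth(c) for c in t.children), default=0)


def serial_space(t: Task) -> int:
    # S_1: peak memory of a one-processor, depth-first execution, where
    # children run one after another and each frees its memory when done.
    return t.alloc + max((serial_space(c) for c in t.children), default=0)


def space_bound(s1: int, p: int, d: int, c: int = 1) -> int:
    # S_1 + c * p * D, with c an assumed stand-in for the hidden constant.
    return s1 + c * p * d


# A root that allocates 1 unit and forks four leaf tasks of 10 units each.
root = Task(alloc=1, children=[Task(alloc=10) for _ in range(4)])
d = depth(root)
s1 = serial_space(root)
bound = space_bound(s1, p=4, d=d)
```

In this toy example the additive term grows with the number of processors and the depth, while the serial space S_1 enters the bound only once.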